Abstract: Breast cancer remains the leading cause of cancer-related mortality in women. 
Comprehensive genomics, proteomics, and metabolomics studies are emerging that offer an 
opportunity to model disease biology, prognosis, and response to specific therapies. Although 
many biomarkers have been identified through advances in data mining techniques, few have 
been applied broadly to make patient-specific decisions. Here, we review a selection of breast 
cancer prognostic indicators and their implications. Our goal is to provide clinicians with a 
general evaluation of emerging computational methodologies for outcome prediction. 
Keywords: computational model, precision prognosis, tumor 

Introduction 

One in eight women develops breast cancer, the most common cause of malignancy 
in females. Although the majority of patients now survive for many years after initial 
diagnosis and therapy, a significant subpopulation remains at risk of metastatic relapse. 
These women have a median survival time of less than 1.5 years from the time of relapse, 1 
with only 10% expected to survive 10 years after diagnosis. 2 

The ability to classify patients into clinically relevant subgroups to allow for precise therapy is urgently needed. Traditional tumor stratification is based largely on morphology, but morphology is informative in fewer than one-quarter of invasive breast carcinomas. 3 However, novel computational models are emerging as powerful tools to address this deficiency. Recent work has integrated critical observations in the biology of breast cancer, including gene deletions, translocations, and locus amplification; 4,5 biomarkers from high-throughput "-omics" technologies such as genomics, proteomics, and metabolomics; and long-recognized outcome variables such as tumor size, histologic grade, axillary nodal status, and estrogen receptor (ER) status. We anticipate that these biomarkers will provide an effective molecular classification and guide the determination of prognosis and the development of tailored therapy (reviewed by Gruver et al 3).

However, much work regarding validation of these approaches is still required. 
For example, single biomarkers have not proven highly informative in most women. 
Across 825 patients with primary breast cancer, only three genes (TP53, PIK3CA, and GATA3) 
carried somatic mutations in more than 10% of tumors, so most candidate mutational biomarkers 
apply to only a small fraction of patients. 6 Similarly, only three multi-gene signatures are incorporated 
into current clinical practice. 7-9 Effective and clinically applicable prognostic indicators 
will also require the ability to address the emerging importance of the intra-tumoral 
heterogeneity of breast cancer. 10 
This review will not address specific clinical prognostic variables, as they have been reviewed extensively by others. 11-14 Instead, we have chosen to evaluate emergent prognostic indicators for breast cancer, focusing on computational methodologies and strategies to identify a molecular indicator (a biomarker or set of biomarkers). We are particularly interested in statistical and computational methodologies based on cutting-edge "-omics" technologies, moving from the conventional single oncogene/tumor suppressor gene strategy to one that exploits advances in systems biology. To achieve our goal, we performed a systematic MEDLINE search of the English-language literature on computational breast cancer prognosis for the period from January 2005 to July 2013. This review will begin with the utility of single molecular biomarkers and multi-gene signatures, followed by an evaluation of novel systems biology-based analyses that infer information from high-throughput "-omics" data. Importantly, we will discuss exciting possibilities in the development of new integrative methodologies.

Identification of prognostic molecular signatures

Examination of genetic variation, genomic association, or transcriptomic alterations in large study cohorts allows the selection of single or multiple prognostic markers that can stratify patients into groups showing distinct outcomes (Table S1). It may also allow the specific tailoring of care. Sample size and biological hypotheses are distinguishing features of prognostic marker studies. 15,16 In terms of sample size, a practical, clinically based prognostic index for patients with metastatic breast cancer was recently proposed on the basis of a retrospective analysis of 2,322 patients after primary treatment. 2 In parallel, others have focused on biological determinants. For example, the Cancer Genome Atlas Network has integrated the four major breast cancer subtypes with data from multiple genetic, epigenetic, and proteomic platforms (genomic DNA copy number, DNA methylation, exome sequencing, messenger RNA expression, microRNA sequencing, and reverse-phase protein arrays). 6 The results are promising for understanding the biological underpinnings of subtype-specific prognostic indicators. However, there remains a critical gap between integrative models of "omics"-data and clinical variables. Describing the biological pathways underlying clinical biomarkers in a large sample could significantly improve our understanding of the biology and heterogeneity of breast cancers. For instance, enrichment of gene sets associated with embryonic stem cell (ESC) identity in the expression profiles of various human tumor types is associated with poor outcome in breast cancer. 17

Single gene prognostic determinants 

For specific tumor subtypes, the expression of single critical genes can serve as prognostic indicators (reviewed by Adam Maciejczyk). 18 Germline mutations of BRCA1 are used extensively for the early detection of familial breast cancer; they are found in 15%-20% of women with a family history of breast cancer and in 60%-80% of patients with combined breast and ovarian cancer. 19 Additional prognostic factors include enhanced expression of the cohesin subunit RAD21, which has been associated with poor prognosis and resistance to chemotherapy in high-grade luminal, basal, and HER2 breast cancers. 20 Expression of the transcription factor muscle segment homeobox 2 (Msx2) has been associated with improved outcome in invasive breast cancers, possibly by increasing the likelihood of tumor cell death via apoptosis. 21 Enhanced expression of the anterior gradient-2 (AGR2) protein, which occurs in the presence of the ER antagonist tamoxifen, confers poor prognosis in ER-positive breast cancers. 22

The most common computational method to assess these critical biomarkers is the Cox proportional hazards regression model. 23 Given values of the predictor variables, the model estimates the hazard, ie, the instantaneous risk of the event of interest (eg, death or relapse) at time t after diagnosis, as a baseline hazard multiplied by exp(βx), where x is the vector of predictors and β their fitted coefficients; the corresponding survival function gives the probability that the event has not occurred by time t. In this context, additional computational models have been proposed to improve prediction, including Bayesian network analysis evaluating probabilistic relationships among candidate genes 24 and support vector machine methodologies. 25 A review of standard survival analyses and the corresponding R packages is available in the CRAN Task View at http://cran.r-project.org/web/views/Survival.html.
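To make the Cox model concrete, the following is a minimal sketch, not the method of any cited study: it fits a single-covariate Cox proportional hazards model by Newton-Raphson on the log partial likelihood (Breslow handling of ties) in plain Python. The toy cohort, the binary "marker" covariate, and the function names are hypothetical; real analyses would use an established implementation such as coxph in the R survival package.

```python
import math

def _risk_set(times, t):
    """Indices of patients still at risk (no event or censoring yet) at time t."""
    return [j for j, tj in enumerate(times) if tj >= t]

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one covariate (Breslow handling of tied times)."""
    ll = 0.0
    for i, (t, e) in enumerate(zip(times, events)):
        if e:  # only observed events contribute a term
            risk = _risk_set(times, t)
            ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll

def fit_cox_1d(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson maximization of the partial likelihood; returns beta."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i, (t, e) in enumerate(zip(times, events)):
            if not e:
                continue
            risk = _risk_set(times, t)
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            mean = sum(wj * x[j] for wj, j in zip(w, risk)) / s0
            second = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0
            score += x[i] - mean          # first derivative of the log partial likelihood
            info += second - mean ** 2    # observed information (minus the second derivative)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical toy cohort: months to relapse, event flag (1 = relapse, 0 = censored),
# and a binary marker (1 = high expression).
times = [2, 3, 5, 7, 8, 10, 12, 15]
events = [1, 1, 1, 0, 1, 1, 0, 1]
marker = [1, 0, 1, 1, 0, 1, 0, 0]

beta = fit_cox_1d(times, events, marker)
print(f"beta = {beta:.3f}, hazard ratio = {math.exp(beta):.2f}")
```

A positive coefficient (hazard ratio above 1) means that, in this toy cohort, high marker expression is associated with earlier relapse.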

Current status of multi-gene prognostic determinants

Multi-gene signatures are most often derived from transcriptomic microarray and sequencing data. 26-28 These signatures not only have potential for classification and prognosis, 29 but can also predict specific tumor sub-phenotypes such as resistance to radiation. 30,31 Generally, these multi-gene models can classify patients, often in an unsupervised manner, into subgroups with either distinct outcomes or different treatment responses. As an example, breast adenocarcinomas can be classified by their DNA content as either genomically stable, conferring good prognosis, or unstable, conferring poor prognosis. 32
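As an illustration of unsupervised classification, the sketch below groups patients by average-linkage agglomerative clustering, using one minus the Pearson correlation between expression profiles as the distance. Correlation distance and average linkage are common choices for expression data but are assumptions here, not necessarily what the cited studies used; the expression values are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of patients with distance = 1 - Pearson r."""
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}

    def linkage(c1, c2):
        # Average linkage: mean of all patient-to-patient distances between clusters.
        pairs = [(min(a, b), max(a, b)) for a in c1 for b in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair (a < b)
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical expression of four genes (columns) in six patients (rows).
profiles = [
    [5.0, 4.8, 1.0, 1.2],
    [4.5, 5.1, 0.9, 1.4],
    [5.2, 4.9, 1.3, 0.8],
    [1.1, 0.9, 4.7, 5.0],
    [0.8, 1.2, 5.2, 4.6],
    [1.3, 1.0, 4.9, 5.3],
]
print(average_linkage(profiles, 2))
```

No outcome labels are used; whether the resulting groups differ in prognosis must be tested afterwards, for example with survival analysis.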
To identify multi-gene signatures or pathways, computational pattern-learning algorithms have been successfully applied to transcript or mass spectrometry profiles. These algorithms can be roughly categorized into three groups: 1) unsupervised data mining, eg, hierarchical clustering, 7,33 topographic projection, 34 and other methodologies; 2) supervised classifiers, eg, decision trees 35 and Prediction Analysis of Microarray (PAM); 9 and 3) semi-supervised learning models. 36 In supervised algorithms, sample labels such as good or poor outcome are required to train the model before it makes decisions. In contrast, unsupervised algorithms are purely data-driven and use no labels. A semi-supervised model learns from both the raw data and the available labels, so that only part of the cohort needs to be labeled. Which method to choose depends on the purpose of the study; for example, if a certain phenotype is known to be an important factor in identifying the multi-gene signature, one should select supervised methodologies to make use of that information.
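For contrast with the unsupervised case, here is a simplified sketch of the nearest shrunken centroid idea that underlies PAM, a supervised classifier that needs outcome labels. Each class centroid is pulled toward the overall centroid by soft-thresholding, so genes whose class differences fall below the threshold `delta` stop contributing. The real PAM method also standardizes each gene by its pooled within-class standard deviation and uses class priors; both are omitted here, and the data, labels, and `delta` value are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def soft_threshold(v, delta):
    """Shrink v toward zero by delta; values within delta become exactly zero."""
    if abs(v) <= delta:
        return 0.0
    return v - delta if v > 0 else v + delta

def train_shrunken_centroids(samples, labels, delta):
    overall = centroid(samples)
    model = {}
    for cls in sorted(set(labels)):
        c = centroid([s for s, y in zip(samples, labels) if y == cls])
        # Shrink each gene's class-vs-overall difference; genes shrunk to zero
        # no longer discriminate between classes.
        model[cls] = [o + soft_threshold(ci - o, delta) for ci, o in zip(c, overall)]
    return model

def predict(model, sample):
    """Assign the class whose shrunken centroid is nearest (squared Euclidean)."""
    def sq_dist(c):
        return sum((x - ci) ** 2 for x, ci in zip(sample, c))
    return min(model, key=lambda cls: sq_dist(model[cls]))

# Hypothetical training data: three genes, six labeled patients.
samples = [[8.0, 5.1, 3.0], [7.5, 4.9, 3.2], [8.2, 5.0, 2.9],
           [2.0, 5.2, 3.1], [2.5, 4.8, 3.0], [1.8, 5.0, 3.1]]
labels = ["poor", "poor", "poor", "good", "good", "good"]

model = train_shrunken_centroids(samples, labels, delta=0.5)
print(predict(model, [7.0, 5.0, 3.0]))
```

With `delta=0.5`, only the first gene keeps distinct class centroids, which is how shrinkage performs gene selection in this family of methods.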

Clinical application of prognostic determinants

In terms of single determinants, the expression levels of four genes (ER, PR, HER2, and Ki67), and combinations thereof, have been shown to have a strong prognostic impact (reviewed by Gokmen-Polar et al 37). Increased levels of urokinase-type plasminogen activator (uPA) and/or plasminogen activator inhibitor-1 (PAI-1), which indicate poor clinical outcome, have been included in the American Society of Clinical Oncology 2007 Update of Recommendations for the Use of Tumor Markers in Breast Cancer, whereas p53 mutation testing was not recommended for routine prognostic use. 38 Other single-gene somatic mutations are relevant to targeted therapy; for instance, activating ESR1 mutations in ER-positive metastatic breast cancer are associated with resistance to endocrine therapy. 39

Several multi-gene-based commercial prognostic tests are now available. The MammaPrint® 70-gene signature assay (Agendia, Inc., Irvine, CA, USA) stratifies patients by outcome and supports personalized therapeutic decisions. 7,40,41 Two additional clinical decision-making assays, the Oncotype DX 21-gene assay 42 (Genomic Health, Inc., Redwood City, CA, USA) and a clinically updated version of the intrinsic subtype PAM50 assay 43 (ARUP Laboratories, Salt Lake City, UT, USA), have been compared directly. In predicting prognostic intrinsic subtypes among 151 patients, investigators observed good agreement between the 21-gene and PAM50 assays for high and low prognostic risk assignment. 44



Overall, molecular prognostic determinants have been successfully developed as adjuvant tools for innovative diagnostic, prognostic, and therapeutic approaches. As shown by Albain et al, 45 treatment decisions changed after review of multi-gene signatures in 30% of individuals, predominantly from chemotherapy plus endocrine therapy to endocrine therapy alone, with an associated reduction in treatment-related toxicity.

However, there are several limitations to the use of either single-gene or multi-gene prognostic signatures in routine practice. For single-gene determinants, these include the requirement for fresh or frozen tissue to measure uPA/PAI-1 and the need for careful normalization against a gold standard or "housekeeping" genes. Another limitation of both single- and multi-gene signatures is that they are informative only in specific subtypes of breast cancer. For instance, the 70-gene assay can only identify potential chemotherapy benefit in high-risk patients. 45 In another study, the discordance rate between the clinicopathologic risk categories given by the 70-gene assay and the 21-gene assay was approximately 30%. 45 Both assays are particularly limited in assigning high-risk status to ER-negative patients. 46
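Agreement between two assays' risk categories, such as the concordance and discordance rates cited above, is commonly summarized by percent agreement and by Cohen's kappa, which discounts the agreement expected by chance from the marginal frequencies. The sketch below computes both; the risk calls are hypothetical and are not taken from the cited studies.

```python
from collections import Counter

def agreement_stats(calls_a, calls_b):
    """Percent agreement, discordance rate, and Cohen's kappa for two assays' risk calls."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    categories = set(freq_a) | set(freq_b)
    # Chance agreement: product of each assay's marginal frequency, summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    kappa = (observed - expected) / (1 - expected)  # undefined if expected == 1
    return {"agreement": observed, "discordance": 1 - observed, "kappa": kappa}

# Hypothetical risk calls for ten patients from two assays.
assay_a = ["high"] * 5 + ["low"] * 5
assay_b = ["high", "high", "high", "low", "low", "low", "low", "low", "low", "high"]
print(agreement_stats(assay_a, assay_b))
```

Here the assays agree on 70% of patients, but because half of that agreement would be expected by chance, kappa is only 0.4.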

How do we overcome these challenges? 

Key issues with current multi-gene signature methodologies include specificity of prediction and standardization across diagnostic platforms. Li et al have reported that many randomly selected genes show predictive power for cancer prognosis in one dataset but lose it in other datasets. 47 Therefore, evaluation in independent datasets is necessary to assess a prognostic model or indicator. Recently, Venet et al 28 compared 47 published breast cancer outcome signatures with "mimic" signatures made of random genes. Surprisingly, they found that 60% of the published signatures were not significantly better outcome predictors than random signatures of identical size. Of particular concern, 23% of the published signatures performed worse than random signatures, and more than 90% of random signatures with more than 100 genes were significant outcome predictors. 28 Thus, the limited reproducibility of multi-gene-based prediction currently prevents its effective use.
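The random-signature comparison can be sketched as a permutation-style check: score each patient by the mean expression of the signature genes, measure discrimination with Harrell's concordance index (C-index), and ask how often random gene sets of the same size do at least as well. This is an illustrative sketch, not the procedure of Venet et al; the scoring rule, the synthetic cohort, and the function names are assumptions.

```python
import random

def c_index(risk_scores, times, events):
    """Harrell's concordance index: a higher risk score should mean an earlier event."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

def signature_score(expr, genes):
    """Mean expression of the signature genes for each patient."""
    n_patients = len(next(iter(expr.values())))
    return [sum(expr[g][p] for g in genes) / len(genes) for p in range(n_patients)]

def random_signature_pvalue(expr, signature, times, events, n_random=1000, seed=0):
    """Empirical p: share of same-size random gene sets with C-index >= the signature's."""
    rng = random.Random(seed)
    observed = c_index(signature_score(expr, signature), times, events)
    pool = sorted(expr)
    hits = 0
    for _ in range(n_random):
        genes = rng.sample(pool, len(signature))
        if c_index(signature_score(expr, genes), times, events) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)

# Synthetic cohort: 20 patients, 50 noise genes; three hypothetical "signature"
# genes are constructed to rise as time to relapse falls.
rng = random.Random(1)
times = list(range(1, 21))
events = [1] * 20
expr = {f"gene{k}": [rng.gauss(0, 1) for _ in times] for k in range(50)}
for g in ("gene0", "gene1", "gene2"):
    expr[g] = [-t + rng.gauss(0, 2) for t in times]

c, p = random_signature_pvalue(expr, ["gene0", "gene1", "gene2"], times, events, n_random=200)
print(f"C-index = {c:.2f}, empirical p = {p:.3f}")
```

In real data the null is harder to beat than here, because many genes co-vary with broad programs such as proliferation; that shared signal is what makes random signatures look predictive.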

The complexity and inconsistency of single- and multi-gene signatures also preclude easy extraction of biologically and therapeutically relevant information. A clue may be provided by observations suggesting that although genetic alterations differ between patients, they frequently involve common pathways. It is therefore critical to identify the pathways involved in breast cancer progression and to detect the corresponding indicators that are prognostic in different tumor subtypes or patients.

Novel strategies for prognostic indicator development

Incorporation of network or pathway information into prognostic biomarker discovery could significantly improve prediction performance. For this purpose, investigators have used a large number of supervised machine-learning approaches (ie, inferring knowledge from data with labeled samples) to take advantage of prior knowledge. For example, data mining of gene expression profiles reveals an association between hyperactivity in the PI3K pathway, lowered ER levels, and resistance to endocrine therapy. 48 Unfortunately, there is no single "one-size-fits-all" algorithm that simultaneously achieves accuracy, stability, and interpretability of gene selection. 49 Table 1 lists methods to identify prognostic determinants based on gene expression in breast cancer. Two recent reviews discuss prognostication based on tumor mutations using pathway and network analysis. 50,51

Table 1 Methods to identify prognostic determinants based on gene expression in breast cancer

Data source | Method | Description
Genome-wide gene expression | RXA-GSP 53 | Summarizes the individualized relative expression between biological experiment-defined gene-set pairs, thus tolerating the diverse noise and differences observed across technologies and laboratories.
Genome-wide gene expression | LDS 36 | This semi-supervised approach successfully employed unlabeled gene expression data and achieved significant performance in gene expression-based outcome prediction for cancer patients.
Genome-wide gene expression | MSS 47 | Identifies prognostic markers that can be used in combination to stratify breast cancer patients into groups of different risk ranks with high accuracy.
Genome-wide gene expression | Correlation 56 | Correlation between two biomarkers is a more useful prognostic factor than their individual expression levels.
Genome-wide gene expression | BCRSVM 25 | Uses the machine-learning method SVM to combine clinical prognostic variables (histological grade, tumor size, number of metastatic lymph nodes, estrogen receptors, lymphovascular invasion, local invasion of tumor, and number of tumors) into a prognostic model.
Genome-wide gene expression | PGL 34 | A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list to construct nonlinear topographic projection maps for prognosis.
Genome-wide gene expression | PAM 9 | PAM, together with other conventional methods, was used to define gene expression-based "intrinsic" subtypes with prognostic value.
Genome-wide gene expression | Cox proportional-hazards regression modeling, gene-set enrichment analysis 17 | Based on careful gene-set enrichment analysis, multiple gene-set signatures stratify samples into prognostic subgroups.
Genome-wide gene expression | Bayesian network analysis 24 | Bayesian probability was employed in neural networks to model censored data.
Gene expression, experiment-based gene signatures | Expression levels relative to a baseline condition, hierarchical clustering, "leave-one-out" cross-validation 7 | Top genes were selected to distinguish prognostic subtypes of breast cancer.
Gene expression, text mining | eScience-Bayesian 59 | Permits coherent integration of prior information and multiple data sources, such as gene expression and information derived from the literature.
Gene expression, clinical and genetic markers | I-RELIEF 57, an iterative method based on the feature selection algorithm RELIEF | Integrates clinical variables with gene expression or biological pathway data.
Gene expression, copy number | iCluster 58 | A likelihood-based, joint latent variable model for integrative clustering of samples.
Gene expression, copy number, pathway | PARADIGM 6,60 | Integrates copy number, mRNA expression, and pathway interaction data into a personalized pathway-by-sample matrix that clusters patients into distinct prognostic subgroups.

Abbreviations: BCRSVM, breast cancer recurrence prediction based on SVM; LDS, low density separation; MSS, multiple survival screening; PAM, prediction analysis of microarray; PARADIGM, pathway recognition algorithm using data integration on genomic models; PGL, predictive gene lists; RXA-GSP, relative expression analysis of gene set pair; SVM, support vector machine.

Using biological hypothesis-based gene selection and interpretation, we and others have identified transcriptional prognostic indicators (Table 1). As prognostic biomarkers, these models calculate a score per sample and use a simple threshold value of 1 to dichotomize patients into prognostic groups. Pitroda et al piloted a tumor endothelium-derived inflammatory signature consisting of six genes that were associated with poor outcome in multiple cancers. 52 Similarly, in multiple adult tumors, we found that histologically poorly differentiated tumors display an ESC-associated expression imbalance, ie, preferential overexpression of targets of ESC-associated transcription factors (NANOG, OCT4, SOX2, and MYC) combined with underexpression of Polycomb-regulated genes. 53 This finding is in agreement with the significant survival difference reported between patients whose tumors express these ESC-like signatures and other patients. 17 More recently, coactivation of the ESC marker MYC and the oncogene HER2 was associated with acquisition of a self-renewal phenotype and poor outcome. 54
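Survival differences between two groups of patients, such as those whose tumors do or do not show an ESC-like signature, are typically tested with the two-group log-rank test. The sketch below implements it directly: at each distinct event time it compares observed with expected events in one group, using the hypergeometric variance. The cohort and group labels are hypothetical.

```python
import math

def logrank_test(times, events, group):
    """Two-group log-rank test; group holds 0/1 labels. Returns (chi2, p) with 1 df."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        at_risk = [k for k, tk in enumerate(times) if tk >= t]
        n = len(at_risk)
        n1 = sum(1 for k in at_risk if group[k] == 1)
        d = sum(1 for k in at_risk if times[k] == t and events[k])
        d1 = sum(1 for k in at_risk if times[k] == t and events[k] and group[k] == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p

# Hypothetical cohort: months to relapse, event flag, and ESC-like signature status.
times = [3, 5, 6, 8, 9, 11, 14, 16, 18, 20, 22, 25]
events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
esc_like = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

chi2, p = logrank_test(times, events, esc_like)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The statistic is symmetric in the two groups, so swapping the labels gives the same result.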

However, the lack of standards for measurement across studies, assays, and diagnostic platforms makes it difficult to translate individual findings into clear clinical applications. A popular solution is to dichotomize patients at the median score of the calculated indicator within the measured cohort, 52 but adding new patients changes the estimated median and can thus change the final decision for patients who have already been classified.
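The instability of a median cutoff can be shown in a few lines. In this sketch, which uses hypothetical scores, a patient is classified first against a small cohort and then again after three higher-scoring patients are added. The median-based call flips, whereas a fixed, pre-specified threshold (1.0, matching the threshold described above) gives the same call both times.

```python
import statistics

def classify_by_median(scores):
    """High risk if the score is above the cohort median (cohort-dependent cutoff)."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def classify_by_fixed_threshold(scores, threshold=1.0):
    """High risk if the score exceeds a fixed, pre-specified threshold."""
    return ["high" if s > threshold else "low" for s in scores]

cohort = [0.6, 0.9, 1.2, 1.3, 1.5]   # hypothetical indicator scores
new_patients = [1.8, 2.0, 2.4]       # hypothetical later additions
idx = 3                              # the patient with score 1.3

before = classify_by_median(cohort)[idx]
after = classify_by_median(cohort + new_patients)[idx]
fixed_before = classify_by_fixed_threshold(cohort)[idx]
fixed_after = classify_by_fixed_threshold(cohort + new_patients)[idx]
print(before, after, fixed_before, fixed_after)  # prints: high low high high
```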

Relative expression analysis of gene-set pairs (RXA-GSP) is a novel methodology that we have pioneered to translate prognostic gene targets into individualized treatment planning. 53 RXA-GSP is built on three principles. 1) Each individual has both favorable and unfavorable prognostic factors for breast cancer, and it is the imbalance between them that determines the individual outcome (Figure 1). For example, NANOG expression is an indicator of poor prognosis, stimulating the growth and metastasis of breast cancer cells, whereas KLF4 is a favorable prognostic indicator that inhibits these processes. 55 2) The correlation between two factors is, at least in some cases, more significant than the overall expression of either. For example, SIRT1 and DBC1 are both overexpressed in breast tumor tissue, but the correlation between their expression levels is diminished. 56 3) We employ a hypothesis-based and experimentally derived approach to the identification of candidates.

Figure 1 Illustration of RXA-GSP method.
Notes: This prognostic indicator is the ratio between the scores (eg, expression values) of poor prognostic markers and those of good prognostic markers. It has the ability to integrate different scales of data, bridging cancer biology with the clinic by employing both hypothesis-based and experimentally derived gene-set selection.
Abbreviation: RXA-GSP, relative expression analysis with gene-set pairs.

Using RXA-GSP, the resulting indicator is computed as the optimal combination of candidate genes, trained on several large cohorts and validated in independent cohorts. 53 By considering the heterogeneity of breast cancer in each individual patient, together with the relative expression of gene-set pairs and the correlations between them, RXA-GSP allows the computationally driven derivation of individualized prognostic indicators, bridging cancer biology with the clinic through gene expression analysis.
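The ratio-and-threshold idea in the Figure 1 notes can be sketched as follows. This is a hedged illustration, not the published RXA-GSP algorithm, which also trains the optimal gene-set combination across cohorts. The assumptions are that the per-set score is the mean expression, that expression values are positive and on a linear scale, and that a ratio above 1 marks poor prognosis. NANOG and KLF4 are the example markers named in the text; the expression values are hypothetical.

```python
def relative_expression_score(sample, poor_genes, good_genes):
    """Ratio of mean poor-prognosis marker expression to mean good-prognosis marker expression."""
    poor = [sample[g] for g in poor_genes]
    good = [sample[g] for g in good_genes]
    if min(poor + good) <= 0:
        raise ValueError("expects positive, linear-scale expression values")
    return (sum(poor) / len(poor)) / (sum(good) / len(good))

def risk_group(score, threshold=1.0):
    """Dichotomize at a fixed threshold: above 1 means unfavorable markers dominate."""
    return "poor prognosis" if score > threshold else "good prognosis"

poor_set = ["NANOG"]  # unfavorable marker named in the text
good_set = ["KLF4"]   # favorable marker named in the text
patients = {
    "patient_A": {"NANOG": 320.0, "KLF4": 110.0},  # hypothetical values
    "patient_B": {"NANOG": 90.0, "KLF4": 250.0},
}
for pid, sample in patients.items():
    s = relative_expression_score(sample, poor_set, good_set)
    print(pid, round(s, 2), risk_group(s))
```

Because the score is computed within each sample, it does not depend on the rest of the cohort, and a scale factor applied to every gene in a sample cancels out of the ratio.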

Next-generation integrative prognostic methods

To date, the majority of studies focused on prognosis have integrated clinical variables with gene expression or biological pathway data. 53,57,58 However, attractive opportunities are now available to integrate comprehensive genotype, metabolomics, and phenotype data. Success should facilitate robust prognostic and therapeutic prediction for each patient. Indeed, success in this endeavor will have significant impact across biomedicine.

However, this potential is limited by the sheer size and complexity of high-throughput data resources, which often result in significant imprecision in data usage. Classical statistical models are frequently insufficient for integrating "omics"-data with other resources, and in the absence of a coherent overarching method of analysis, specific problems are dealt with on a case-by-case basis. Indeed, we and others have proposed the development of integrated breast cancer prognostic indicators to provide insights into significant variables. 53,59,60

Eklund et al 59 have addressed this problem through an eScience-Bayes approach, which complements a Bayesian probability model's ability to incorporate gene expression with text mining, modeling highly complex problems with high-performance computing. Applying this model to several independent gene expression datasets results in consistently accurate prediction of breast cancer metastases across cohorts. 59 Though such methods show promise for tailoring specific models to complex problems with high accuracy, their application is cumbersome, as they require manual selection of genes for which prior knowledge exists. This deficiency is related to the lack of a standard for representing scientific text in computationally readable formats. Another obvious weakness is that a microarray represents only a single snapshot of the patient's tumor in time.

New avenues of patient-specific tumor phenotyping are also facilitating prognosis prediction. These methodologies include the evaluation of risk associated with inherited genetic variation through genome-wide association studies (GWAS). 61,62 Similarly, epigenetic changes in breast cancer cells (such as modifications in histone acetylation and DNA methylation at specific gene regulatory elements) have been implicated in breast oncogenesis. 6,63 These observations, coupled with the emerging use of epigenetic-modifying agents, suggest an era of genomics-based therapy selection. 33,64,65 In addition, stromal/microenvironmental elements affect prognosis by modifying tumor invasion and/or drug targeting. 66,67

Thus, the interpretation of multi-gene assays likely depends on adjustment for these other factors. A breakthrough approach, the Pathway Recognition Algorithm using Data Integration on Genomic Models (PARADIGM), 60 integrates copy number, mRNA expression, and pathway interaction data into a personalized pathway-by-sample matrix that clusters patients into distinct prognostic subgroups. This integration algorithm has recently been applied to depict comprehensive molecular portraits of human breast tumors. 6

In summary, we believe that biological hypothesis-driven computational models that integrate significant variables at multiple levels, including transcriptomic alterations, chromatin modification, and stromal response, will provide significant insight not only into the biology of breast cancer but also into the prognosis of individual patients.

Conclusion 

Challenges remain for breast cancer prognosis due to the complexity of the disease and the lack of standardization among models and "omics"-datasets. Specifically, the convenience and simplicity necessary for clinical application are impeded by the lack of precise molecular indicators that support individualized therapeutic decisions. Ideally, a prognostic indicator is specific, sensitive, inexpensive, and easy to use. We expect an integrative computational model of "omics"-data and clinical variables to significantly improve our understanding of the biology and heterogeneity of breast cancers. This is likely to be achieved either by enlarging the sample size or by characterizing a specific subtype with given phenotypes. The latter approach, although controversial, has recently gained recognition through the hypothesis that breast cancer is a collection of genuinely distinct malignancies that happen to originate from the breast epithelium. 68 In conclusion, a prognostic indicator should clearly delineate the risk group of an individual in terms of tumor growth, invasion, and metastatic potential, and/or the likelihood of response to a given therapeutic modality.
Table S1 Selected instances from literature search

1st author | Year | Journal | PubMed ID | Cancer biology underpinning | Related clinical indicator | Major contribution/conclusion
van 't Veer 1 | 2002 | Nature | 11823860 | Gene expression patterns | Lymph node status | Provides a powerful tool to tailor adjuvant systemic treatment that could greatly reduce the cost of BC treatment.
Rennstam 2 | 2003 | Cancer Research | 14695203 | Chromosomal copy number aberrations |  | Patterns of copy number gains and losses define BCs with distinct clinicopathological features and patient prognosis.
Paik 3 | 2004 | New England Journal of Medicine | 15591335 | Gene expression patterns | Node-negative, ER-positive, tamoxifen treatment | A novel recurrence score based on 21 genes to quantify the likelihood of distant recurrence in patients as well as overall survival time.
Kronenwett 4 | 2006 | Cancer Epidemiology, Biomarkers and Prevention | 16985023 | Genomic stability |  | Objective classification of BCs into stable and unstable subtypes that are a prognostic indicator independent of established clinical factors.
Bacac 5 | 2006 | PLoS One | 17183660 | Stromal cells |  | Human genes expressed in the mouse stromal response to tumor invasion predict BC patient survival.
Suh 6 | 2007 | Clinical Cancer Research | 17200346 | CLIC4 (chloride intracellular channel 4) |  | Reactivation and restoration of CLIC4 in tumor cells, or the converse in tumor stromal cells, could provide a novel approach to inhibit tumor growth.
Conlin 7 | 2007 | Molecular Diagnosis and Therapy | 18078353 | Oncotype DX recurrence score assay | Lymph node-negative, ER-expressing BC | The Oncotype DX assay and others aim to help improve risk classification and recurrence prediction and optimize selection of patients for adjuvant chemotherapy.
Rodriguez 8 | 2008 | Carcinogenesis | 18499701 | HOXB13 (homeobox B13) and IL17BR (interleukin 17 receptor B) | Estrogen signaling | Hypermethylation of HOXB13 is a later event of tumor progression and a prognostic indicator of advanced BC.
Wei 9 | 2008 | Molecular Carcinogenesis | 18176935 | H3K27me3 |  | Loss of H3K27me3 is a predictor of poor outcome in BCs.
Kim 10 | 2008 | Annals of Oncology | 17956886 | CDKs | Patients recruited for study underwent mastectomy or breast-conserving surgery | Tumors with high CDK1SA and high CDK2SA showed significantly poorer 5-year relapse-free survival than those with low CDK1SA and low CDK2SA, respectively.
Han 11 | 2008 | Nature | 18337816 | SATB1 (SATB homeobox 1) |  | SATB1 is a genome organizer that tethers multiple genomic loci and recruits chromatin-remodeling enzymes to regulate chromatin structure and gene expression.
Ben-Porath 12 | 2008 | Nature Genetics | 18443585 | Stem cell gene expression signatures |  | Detailed characterization of the stem-cell regulatory networks active in cancer is likely to yield powerful diagnostic and prognostic markers.
Parker 13 | 2009 | Journal of Clinical Oncology | 19204204 | A 50-gene set (PAM50) | "Intrinsic" subtypes, pathologic staging, histologic grade | The intrinsic subtype and risk predictors based on the PAM50 gene set add significant prognostic and predictive value.
Sung 14 | 2010 | Cancer Science | 20412117 | SIRT1 (sirtuin 1) and CCAR2 (also known as DBC1, deleted in breast cancer 1, KIAA1967) | Luminal subtype, ER and PR expression | Correlation between SIRT1 and DBC1 is a more useful prognostic factor than their individual expression levels. Correlation between the two is decreased in tumor cells.
Gevensleben 15 | 2010 | International Journal of Molecular Medicine | 21042777 | 70-gene expression profile MammaPrint® | Size, age, histological grade, hormone receptor status, peritumoral vascular invasion, and HER2 status | The MammaPrint® gene signature is shown to provide additional independent prognostic information.
Lanigan 16 | 2010 | Breast Cancer Research | 20682066 | Msx2 (msh homeobox 2) |  | Msx2 expression is associated with improved outcome in BCs, possibly by increasing the likelihood of tumor cell death by apoptosis.
Creighton 17 | 2010 | Breast Cancer Research | 20569503 | PI3K pathway | Luminal ER+ breast tumors | Luminal B tumors have hyperactive GFR/PI3K signaling associated with lower ER levels, which has been correlated with resistance to endocrine therapy. Targeting PI3K in these tumors may reverse loss of ER expression and signaling and restore hormonal sensitivity.
Xu 18 | 2011 | Breast Cancer Research | 21255398 | RAD21 (RAD21 homolog [S. pombe]) | N/A | RAD21 expression confers poor prognosis and resistance to chemotherapy in high-grade luminal, basal, and HER2 BCs.
Littlepage 19 | 2012 | Cancer Discovery | 22728437 | ZNF217 (zinc finger protein 217) | Amplification of the human chromosomal region 20q13 | ZNF217 (amplified in numerous cancers) is a poor prognostic indicator and therapeutic target in patients with BC and may be a strong biomarker of triciribine treatment efficacy in patients.
Pitroda 20 | 2012 | PLoS One | 23056240 | Endothelial inflammatory pathways |  | The first prognostic cancer gene signature derived from an experimental model of tumor-associated endothelial inflammation.
Kim 21 | 2012 | Journal of Breast Cancer | 22807942 | Gene expression patterns | Histological grade, size, number of metastatic lymph nodes, ER, lymphovascular invasion, local invasion of tumor, and number of tumors | As the selected prognostic factors can be easily obtained in clinical practice, the proposed model might prove useful in the prediction of BC recurrence.
Faryna 22 | 2012 | FASEB Journal | 22930747 | Aberrant DNA methylation | Low-grade ER- and/or PR-positive BC | Early methylation changes are frequent in the low-grade pathway of BC and may be useful in the development of prognostic markers.
Fasching 23 | 2012 | Human Molecular Genetics | 22532573 | TOX3 (TOX high mobility group box family member 3) |  | With the exception of rs3803662 (TOX3), there was no evidence that any of the SNPs associated with BC susceptibility were associated with BC survival.
Huang 24 | 2013 | Cell and Bioscience | 23497677 | MACC1 (metastasis associated in colon cancer 1) | Clinicopathologic features | Overexpression of MACC1 in BC is significantly correlated with adverse clinicopathological features, including metastasis and patient survival.
Yang 25 | 2013 | PLoS One | 23441166 | Mediation of transcription factor GRHL2 (grainyhead-like 2 [Drosophila]) on its targets is prognostic in BC | Histological grade | Proposed the RXA-GSP (relative expression analysis with gene-set pairs) method, which shows promise both as a valid prediction model and for clinical utility.
Nagata 26 | 2014 | Breast Cancer | 22528804 | Induced pluripotent stem cell inducing factors |  | Strong expression of NANOG is an indicator of poor prognosis for BC patients, whereas KLF4 is a favorable prognostic indicator.

Note: Each keyword (genomic, transcriptional, epigenetic, sequence, novel) was searched in PubMed together with "breast cancer" and "prognostic indicator", from January 2005 to July 2013.
Abbreviations: BC, breast cancer; ER, estrogen receptor; PAM, prediction analysis of microarray; RXA-GSP, relative expression analysis of gene set pair; SNPs, single-nucleotide polymorphisms.
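The keyword search described in the note to Table S1 can be made reproducible by building NCBI E-utilities ESearch requests, one per keyword. The exact query syntax and field tags used for the original search are not reported, so the boolean form below is an assumption. The sketch only constructs the URLs and does not contact the server.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
KEYWORDS = ["genomic", "transcriptional", "epigenetic", "sequence", "novel"]

def build_query(keyword):
    """Assumed boolean form: keyword AND both quoted phrases from the Table S1 note."""
    return f'{keyword} AND "breast cancer" AND "prognostic indicator"'

def esearch_url(keyword, retmax=500):
    params = {
        "db": "pubmed",
        "term": build_query(keyword),
        "datetype": "pdat",        # filter on publication date
        "mindate": "2005/01/01",   # January 2005
        "maxdate": "2013/07/31",   # July 2013
        "retmax": retmax,          # maximum number of PubMed IDs to return
    }
    return f"{ESEARCH}?{urlencode(params)}"

for kw in KEYWORDS:
    print(esearch_url(kw))
```

Fetching each URL returns the matching PubMed IDs. These can then be de-duplicated across keywords and screened manually, as was done to select the entries in Table S1.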